Working With Matrices

data science data types programming R

Working with matrices in R.

Danielle Brantley https://gist.github.com/danielle-b
01-23-2020

A matrix is a collection of elements of the same data type, arranged into rows and columns. Because a matrix has both rows and columns, it is two-dimensional, whereas a vector is one-dimensional.
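Here is a small sketch of both ideas. It uses base R's matrix() function, which this post doesn't otherwise cover, and made-up values rather than the film data.

```r
# matrix() arranges 6 values into 2 rows and 3 columns, filling column by column
m <- matrix(1:6, nrow = 2)
print(m)

# A plain vector has no dim attribute, while a matrix has two dimensions
v <- c(1, 2, 3)
print(dim(v))  # NULL
print(dim(m))  # 2 3

# Every element must share one type: mixing numbers and text
# turns the whole matrix into a character matrix
mixed <- matrix(c(1, 2, "a", "b"), nrow = 2)
print(typeof(mixed))  # "character"
```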

The DataQuest lesson analyzes university rankings. For this post, however, I decided to analyze six of the highest-grossing films of all time, using data from Box Office Mojo. Please note that I rounded the values in the last two columns, Budget in Millions and Domestic Opening in Millions, so the values in these columns are not exact.

Combining Vectors into Matrices

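DataQuest taught me that to create a matrix from the data above, I first need to create a vector for each film. Each vector holds the film's rank, release year, runtime in minutes, budget in millions, and domestic opening in millions.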

endgame <- c(1, 2019, 181, 356, 357)
avatar <- c(2, 2009, 162, 237, 77)
titanic <- c(3, 1997, 194, 200, 28)
star_wars <- c(4, 2015, 138, 245, 248)
infinity_war <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)

I can combine these vectors into a matrix using the rbind() function. The r in rbind() stands for row: the function combines multiple vectors and matrices by row, so each vector becomes one row of the matrix, named after the vector.

film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
film_matrix
               [,1] [,2] [,3] [,4] [,5]
endgame           1 2019  181  356  357
avatar            2 2009  162  237   77
titanic           3 1997  194  200   28
star_wars         4 2015  138  245  248
infinity_war      5 2018  149  316  258
jurassic_world    6 2015  124  150  209
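For comparison, here is a sketch of an alternative I didn't use in the lesson: base R's matrix() with byrow = TRUE fills values row by row, so it can build the same matrix from one long vector. I only use three films to keep it short, and with matrix() the row names have to be passed in separately through dimnames.

```r
endgame <- c(1, 2019, 181, 356, 357)
avatar  <- c(2, 2009, 162, 237, 77)
titanic <- c(3, 1997, 194, 200, 28)

# rbind(): each vector becomes one row, and the variable names become row names
by_rbind <- rbind(endgame, avatar, titanic)

# matrix() with byrow = TRUE fills the same values row by row;
# the row names are supplied through dimnames (no column names yet)
by_matrix <- matrix(c(endgame, avatar, titanic), nrow = 3, byrow = TRUE,
                    dimnames = list(c("endgame", "avatar", "titanic"), NULL))

print(identical(by_rbind, by_matrix))  # TRUE
```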

Naming Matrix Rows and Columns

I then learned that I can name the rows and columns of a matrix, using rownames() for rows and colnames() for columns. My rows were already named after the vectors I passed to rbind(), so I only needed to name the columns. First, I stored the column names in a vector called categories. I then used colnames() to assign those names to the columns of my matrix.

categories <- c("rank", "year", "runtime_minutes", "budget_millions", "domestic_opening_millions")
colnames(film_matrix) <- categories
film_matrix
               rank year runtime_minutes budget_millions
endgame           1 2019             181             356
avatar            2 2009             162             237
titanic           3 1997             194             200
star_wars         4 2015             138             245
infinity_war      5 2018             149             316
jurassic_world    6 2015             124             150
               domestic_opening_millions
endgame                              357
avatar                                77
titanic                               28
star_wars                            248
infinity_war                         258
jurassic_world                       209
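rownames() works the same way for rows. This sketch rebuilds film_matrix so it runs on its own, reads the row names that rbind() created, and then, on a copy, replaces them with full film titles. The titles are my own illustration; the rest of this post keeps the short names.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")

# The row names came from the vector names passed to rbind()
print(rownames(film_matrix))

# Replace them on a copy; the new vector needs exactly one name per row
titled <- film_matrix
rownames(titled) <- c("Avengers: Endgame", "Avatar", "Titanic",
                      "Star Wars: The Force Awakens", "Avengers: Infinity War",
                      "Jurassic World")
print(titled[, "year"])
```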

Finding Matrix Dimensions

To identify the dimensions (the number of rows and columns) of a matrix, I use the dim() function. It returns two numbers: the first is the number of rows and the second is the number of columns.

dim(film_matrix)
[1] 6 5
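Base R also has nrow() and ncol() for when I only need one of the two numbers, and length() counts every element in the matrix (6 rows × 5 columns = 30). A quick sketch:

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)

print(nrow(film_matrix))    # 6, same as dim(film_matrix)[1]
print(ncol(film_matrix))    # 5, same as dim(film_matrix)[2]
print(length(film_matrix))  # 30 elements in total
```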

Adding Columns to Matrices

Earlier in this post, I used rbind() to combine my vectors by row. The cbind() function, where the c stands for column, combines vectors and matrices by column.

Let’s say I wanted to add the domestic gross of each film as a new column. First, I would create a vector of the domestic grosses, in the same order as the rows of the matrix. Next, I would use cbind() to add it to the existing matrix as a domestic_gross_millions column.

domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
cbind(film_matrix, domestic_gross_millions)
               rank year runtime_minutes budget_millions
endgame           1 2019             181             356
avatar            2 2009             162             237
titanic           3 1997             194             200
star_wars         4 2015             138             245
infinity_war      5 2018             149             316
jurassic_world    6 2015             124             150
               domestic_opening_millions domestic_gross_millions
endgame                              357                     858
avatar                                77                     761
titanic                               28                     659
star_wars                            248                     937
infinity_war                         258                     679
jurassic_world                       209                     652
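The new column takes its name from the vector's variable name. If I wanted a different column name, cbind() also accepts named arguments. In this sketch, gross_millions is a hypothetical name chosen only for illustration, and the check at the end shows that film_matrix itself is left unchanged.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)

# The argument name (gross_millions, hypothetical) becomes the column name
renamed <- cbind(film_matrix, gross_millions = domestic_gross_millions)
print(colnames(renamed))

# cbind() returned a new matrix; film_matrix still has its original 5 columns
print(ncol(film_matrix))  # 5
```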

Because cbind() returns a new matrix instead of changing film_matrix, I then stored the result in a new matrix called entire_matrix.

entire_matrix <- cbind(film_matrix, domestic_gross_millions)
entire_matrix
               rank year runtime_minutes budget_millions
endgame           1 2019             181             356
avatar            2 2009             162             237
titanic           3 1997             194             200
star_wars         4 2015             138             245
infinity_war      5 2018             149             316
jurassic_world    6 2015             124             150
               domestic_opening_millions domestic_gross_millions
endgame                              357                     858
avatar                                77                     761
titanic                               28                     659
star_wars                            248                     937
infinity_war                         258                     679
jurassic_world                       209                     652

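When adding a vector to a matrix, it’s important to make sure the new vector has the right length: one value per row when adding a column with cbind(), or one value per column when adding a row with rbind(). If the lengths don’t match, R recycles or truncates the vector to fit, and it only warns when the matrix dimension isn’t a multiple of the vector’s length, so a mistake can slip through silently.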
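Here is a sketch of that silent recycling with a deliberately short vector, plus a small guard. add_column() is a hypothetical helper of my own, not a base R function: it simply refuses to bind a column of the wrong length.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)

# A mistake: only three values for six rows
too_short <- c(858, 761, 659)

# No error and no warning, because 6 is a multiple of 3:
# the three values are silently repeated down all six rows
recycled <- cbind(film_matrix, too_short)
print(recycled[, "too_short"])

# Hypothetical helper: stop unless there is exactly one value per row
add_column <- function(mat, new_col) {
  if (length(new_col) != nrow(mat)) {
    stop("expected ", nrow(mat), " values but got ", length(new_col))
  }
  cbind(mat, new_col)
}
```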

Indexing Matrices

Just as I indexed vectors, I learned that I can index matrices. Since matrices are two-dimensional, an index has two parts, written as [row, column], and matrices can be indexed in the following ways:

Indexing By Element

Let’s say I wanted to extract the year that Avengers: Infinity War was released. I have to specify the location of this element by row and column. In my matrix, Infinity War is in row 5 and the year is in column 2.

entire_matrix[5,2]
[1] 2018

I can also index matrices by row and column names instead of position:

entire_matrix["infinity_war", "year"]
[1] 2018
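Names and positions point to the same element. As a sketch, base R's which() can look up the position that goes with a name, confirming that "infinity_war" is row 5 and "year" is column 2.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

# Find the positions that match each name
row_pos <- which(rownames(entire_matrix) == "infinity_war")
col_pos <- which(colnames(entire_matrix) == "year")
print(c(row_pos, col_pos))  # 5 2

# Indexing by those positions gives the same element as indexing by name
print(entire_matrix[row_pos, col_pos] == entire_matrix["infinity_war", "year"])  # TRUE
```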

Because the budget_millions and domestic_opening_millions columns are next to each other (columns 4 and 5), I can select both with the range 4:5.

entire_matrix[5, 4:5]
          budget_millions domestic_opening_millions 
                      316                       258 

I can also index columns that are not next to each other. Let’s say I wanted the rank and runtime_minutes values for Titanic and Infinity War. Here I index these rows and columns in two ways: the first example is by position, the second is by name.

entire_matrix[c(3,5), c(1,3)]
             rank runtime_minutes
titanic         3             194
infinity_war    5             149
entire_matrix[c("titanic", "infinity_war"), c("rank", "runtime_minutes")]
             rank runtime_minutes
titanic         3             194
infinity_war    5             149

Indexing By Row and Column

I can also index to select an entire row or column. Let’s say I want to extract all the values for Avatar, which are in row 2 of my matrix. I specify the row (here by its name, "avatar") and leave the column position blank.

entire_matrix["avatar", ]
                     rank                      year 
                        2                      2009 
          runtime_minutes           budget_millions 
                      162                       237 
domestic_opening_millions   domestic_gross_millions 
                       77                       761 

When I index an entire row or column, I only need to specify that row or column, by name or by position; the other position is left blank. In this next example, I index an entire column. Since the row position comes before the column position, I leave the row blank.

entire_matrix[ , "budget_millions"]
       endgame         avatar        titanic      star_wars 
           356            237            200            245 
  infinity_war jurassic_world 
           316            150 
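Selecting a single column (or row) like this returns a plain named vector rather than a one-column matrix. When I need to keep the matrix shape, base R's drop = FALSE argument does that, as this sketch shows.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

# Default: the single column is simplified to a named vector
as_vector <- entire_matrix[, "budget_millions"]
print(is.matrix(as_vector))  # FALSE

# drop = FALSE keeps a 6 x 1 matrix, column name included
as_matrix <- entire_matrix[, "budget_millions", drop = FALSE]
print(dim(as_matrix))        # 6 1
```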

I can also select multiple columns at once. If I want to extract the year, runtime_minutes and budget_millions columns, I can write either of the following, the first by name and the second by position:

entire_matrix[,c("year", "runtime_minutes", "budget_millions")]
               year runtime_minutes budget_millions
endgame        2019             181             356
avatar         2009             162             237
titanic        1997             194             200
star_wars      2015             138             245
infinity_war   2018             149             316
jurassic_world 2015             124             150
entire_matrix[,c(2,3,4)]
               year runtime_minutes budget_millions
endgame        2019             181             356
avatar         2009             162             237
titanic        1997             194             200
star_wars      2015             138             245
infinity_war   2018             149             316
jurassic_world 2015             124             150
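Base R also supports negative indexing, which keeps everything except the listed positions. In this sketch, dropping columns 1, 5 and 6 (rank, domestic_opening_millions and domestic_gross_millions) leaves the same three columns as the selections above. Negative indexing only works with positions, not names.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

# Keep three columns by name...
by_name <- entire_matrix[, c("year", "runtime_minutes", "budget_millions")]

# ...or drop the other three by position with a minus sign
by_exclusion <- entire_matrix[, -c(1, 5, 6)]

print(identical(by_name, by_exclusion))  # TRUE
```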

Similarly, if I want to extract the star_wars, infinity_war and jurassic_world rows, I can use either names or positions:

entire_matrix[c("star_wars","infinity_war","jurassic_world"), ]
               rank year runtime_minutes budget_millions
star_wars         4 2015             138             245
infinity_war      5 2018             149             316
jurassic_world    6 2015             124             150
               domestic_opening_millions domestic_gross_millions
star_wars                            248                     937
infinity_war                         258                     679
jurassic_world                       209                     652
entire_matrix[c(4,5,6), ]
               rank year runtime_minutes budget_millions
star_wars         4 2015             138             245
infinity_war      5 2018             149             316
jurassic_world    6 2015             124             150
               domestic_opening_millions domestic_gross_millions
star_wars                            248                     937
infinity_war                         258                     679
jurassic_world                       209                     652
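Rows can also be selected with a logical condition instead of a list of names or positions. As a sketch, this keeps every film released in 2015 or later; the comparison produces one TRUE or FALSE per row, and only the TRUE rows are kept.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

# One logical value per row: is the release year 2015 or later?
is_recent <- entire_matrix[, "year"] >= 2015

# Keep the matching rows and all columns
recent <- entire_matrix[is_recent, ]
print(rownames(recent))
# "endgame" "star_wars" "infinity_war" "jurassic_world"
```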

Ranking Films

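I can use the rank() function to rank the films by any one column, such as domestic_opening_millions. The function returns a numeric vector in which the smallest value gets rank 1, so Titanic, with the smallest domestic opening, is ranked 1 and Endgame, with the largest, is ranked 6.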

rank(entire_matrix[,"domestic_opening_millions"])
       endgame         avatar        titanic      star_wars 
             6              2              1              4 
  infinity_war jurassic_world 
             5              3 
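Since rank() counts up from the smallest value, a few related base R tools are handy, sketched below: negating the values makes the largest opening rank 1; tied values (Star Wars and Jurassic World were both released in 2015) share the average of their ranks by default; and order() returns row positions that can reorder the whole matrix.

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

opening <- entire_matrix[, "domestic_opening_millions"]

# Negating the values makes the largest opening rank 1
print(rank(-opening))

# Ties share the average rank (ties.method = "average" is the default):
# star_wars and jurassic_world both get 3.5
print(rank(entire_matrix[, "year"]))

# order() gives row positions; indexing with them sorts the whole matrix
by_opening <- entire_matrix[order(opening, decreasing = TRUE), ]
print(rownames(by_opening))
```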

Calculating the Sum of Values in a Vector and a Matrix

This last section of this post covers calculating the sum of the values in a vector and in a matrix, which I can do with the sum() function.

Let’s recall the titanic vector I created at the beginning of this post.

titanic <- c(3, 1997, 194, 200, 28)

To add up the values in this vector, I would write:

sum(titanic)
[1] 2422

As you can see, the sum of this vector is 2422. What if I wanted to add up all the values in the titanic row of my matrix?

sum(entire_matrix["titanic", ])
[1] 3081

Here the sum of the values in my titanic row is 3081. Why are the two sums different? Remember that I added the domestic_gross_millions column to the matrix after creating the original vectors, so the titanic vector does not include Titanic’s domestic gross. The difference, 3081 - 2422 = 659, is exactly that missing value.

Just as I summed the values in a row, I can do the same for a column. To add up all the values in the domestic_opening_millions column, I would type the following:

sum(entire_matrix[, "domestic_opening_millions"])
[1] 1177
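To get every row total or every column total at once, base R has rowSums() and colSums(). In this matrix the column sums are usually the more meaningful ones, since each column holds a single kind of measurement. A sketch that reproduces the two totals above:

```r
endgame        <- c(1, 2019, 181, 356, 357)
avatar         <- c(2, 2009, 162, 237, 77)
titanic        <- c(3, 1997, 194, 200, 28)
star_wars      <- c(4, 2015, 138, 245, 248)
infinity_war   <- c(5, 2018, 149, 316, 258)
jurassic_world <- c(6, 2015, 124, 150, 209)
film_matrix <- rbind(endgame, avatar, titanic, star_wars, infinity_war, jurassic_world)
colnames(film_matrix) <- c("rank", "year", "runtime_minutes",
                           "budget_millions", "domestic_opening_millions")
domestic_gross_millions <- c(858, 761, 659, 937, 679, 652)
entire_matrix <- cbind(film_matrix, domestic_gross_millions)

# One total per row; the titanic entry matches sum(entire_matrix["titanic", ])
row_totals <- rowSums(entire_matrix)
print(row_totals)

# One total per column; domestic_opening_millions matches the 1177 above
col_totals <- colSums(entire_matrix)
print(col_totals)
```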

So the sum of all the values in the domestic_opening_millions column is 1177. Since these values are in millions (and rounded), the combined domestic opening weekend total for all six films is about $1,177,000,000!

That just about does it for matrices in R! In the next post, I’ll get into lists in R.

Citation

For attribution, please cite this work as

Brantley (2020, Jan. 23). Data Sci Dani: Working With Matrices. Retrieved from https://datascidani.com/posts/working_with_matrices 01-23-20/

BibTeX citation

@misc{brantley2020working,
  author = {Brantley, Danielle},
  title = {Data Sci Dani: Working With Matrices},
  url = {https://datascidani.com/posts/working_with_matrices 01-23-20/},
  year = {2020}
}